10 Statistics and Causation
No examination of numbers is complete without the careful consideration of how
the measurements were obtained, encompassing the observational or experimental
setup. The shape of a distribution of measurements can often distinguish
between different models of how the numbers could have arisen, and sometimes
the extremes of the distribution are especially important in making that distinction.
Because observations in the extremes are sparse, confidence in the reliability of
their values matters all the more.
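The point about extremes can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes two hypothetical generating models, a Gaussian and a Laplace distribution, both with zero mean and unit variance, and compares how often each produces values beyond a modest and an extreme threshold. The function names, sample size, and thresholds are arbitrary choices made for illustration.

```python
import random


def tail_fraction(samples, threshold):
    """Fraction of samples whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in samples) / len(samples)


def laplace_unit_variance(rng):
    # The difference of two independent exponentials with rate sqrt(2)
    # follows a Laplace distribution with mean 0 and variance 1.
    rate = 2 ** 0.5
    return rng.expovariate(rate) - rng.expovariate(rate)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000             # assumed sample size, chosen for illustration

gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
laplace = [laplace_unit_variance(rng) for _ in range(n)]

# Compare the two models near the bulk (1.0) and far out in the tails (4.0).
for threshold in (1.0, 4.0):
    print(
        threshold,
        tail_fraction(gaussian, threshold),
        tail_fraction(laplace, threshold),
    )
```

Under these assumptions the two models give tail fractions of the same order at the lower threshold but differ by well over an order of magnitude at the higher one, where the Gaussian sample contains only a handful of observations. The distinction between the models therefore rests on exactly those few values whose reliability matters most.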
Statistics often focuses on establishing correlations without enquiring into causes.
Causes are discussed in the next section.
10.2 The Calculus of Causation
Although in Chap. 6 the goal of science was rather dispassionately stated as “gen-
erating conditional information in the form of hypotheses and theories relating the
observed facts to each other using axiom systems” (Sect. 6.1.2), this does not really
capture the enormously strong desire of man to understand the causes of things. As
Max Planck has remarked,² “As the law of causality immediately seizes the awakening
soul of the child and causes him indefatigably to ask ‘Why?’ so it accompanies the
investigator through his whole life and incessantly sets him new problems”. Statis-
tics originated in a search for causation, but ended up becoming a tool to establish
correlations between variables, as, essentially, a data-reduction exercise. This view
is epitomized by Karl Pearson’s remark that “data is all there is to science”, and
echoed by R. A. Fisher, who saw statistics as the study of methods of data reduction.
As such, one might even question whether it could generate new knowledge, since
once the structural framework of the procedures and calculations was established,
the rest would be merely a matter of deduction.
Planck’s apothegm echoes Virgil’s felix qui potuit rerum cognoscere causas, and
an important step on the road to getting to grips with causation as something beyond
association and correlation was Sewall Wright’s path analysis.³
Statistics is rooted in observation, for which probabilistic notation is well suited.
The probability of an event can be established by observing its frequency of
occurrence. Events can be linked via conditional probability (Sect. 9.2.2). Thus, in
agronomy, one might ask the question “what is the probability of an $x$-fold enhanced
yield ($Y$), given that it rained for the entire month of June?” This can be expressed
as $P\{Y \mid R\}$. Observation might lead to the establishment of a correlation between
crop yield and rainfall $R$. A similar question, “what is the probability of an $x$-fold
enhanced yield, given that the field has been fertilized with gypsum?” might be
addressed in a similar fashion, leading to the establishment of a correlation between
crop yield and fertilizer dose. But clearly fertilization is a human intervention. It was
2 Planck (1932).
3 Wright (1921, 1983), see also Burks (1926), Good (1961), Pearl (1994, 2020). The famous guinea
pig experiments are described in Wright (1920).